1 GCCCTTGGCA GCAGCCCTGT TACCGCTTAG ATGGCGCGCA GGACAGAGCC 

51 CCCCGACGGG GGCTGGGGAC GGGTGGTGGT GCTCTCAGCG TTCTTCCAGT 

101 CGGCGCTTGT GTTTGGGGTG CTCCGCTCCT TTGGGGTCTT CTTCGTGGAG 

151 TTTGTGGCGG CGTTTGAGGA GCAGGCAGCG CGCGTCTCCT GGATCGCCTC 

201 CATAGGAATC GCGGTGCAGC AGTTTGGGAG CCCGGTAGGC AGTGCCCTGA 

251 GCACGAAGTT CGGGCCCAGG CCCGTGGTGA TGACTGGAGG CATCTTGGCT 

301 GCGCTGGGGA TGCTGCTCGC CTCTTTTGCT ACTTCCTTGA CCCACCTATA 

351 CCTGAGTATT GGGTTGCTGT CAGGCTCTGG CTGGGCTTTG ACCTTCGCTC 

401 CGACCCTGGC CTGCCTGTCC TGTTATTTCT CTCGCCGACG ATCCCTGGCC 

451 ACCGGGCTGG CACTGACAGG CGTGGGCCTC TCCTCCTTCA CATTTGCCCC 

501 CTTTTTCCAG TGGCTGCTCA GCCACTACGC CTGGAGGGGG TCCCTGCTGC 

551 TGGTGTCTGC TCTCTCCCTC CACCTAGTGG CCTGTGGTGC TCTCCTCCGC 

601 CCACCCTCCC TGGCTGAGGA CCCTGCTGTG GGTGGTCCCA GGGCCCAACT 

651 CACCTCTCTC CTCCATCATG GCCCCTTCCT CCGTTACACT GTTGCCCTCA 

701 CCCTGATCAA CACTGGCTAC TTCATTCCCT ACCTCCACCT GGTGGCCCAT 

751 CTCCAGGACC TGGATTGGGA CCCACTACCT GCCGCCTTCC TACTCTCAGT 

801 TGTTGCTATT TCTGACCTCG TGGGGCGTGT GGTCTCCGGA TGGCTGGGAG 

851 ATGCAGTCCC AGGGCCTGTG ACACGACTCC TGATGCTCTG GACCACCTTG 

901 ACTGGGGTGT CACTAGCCCT GTTCCCTGTA GCTCAGGCTC CCACAGCCCT 

951 GGTGGCTCTG GCTGTGGCCT ACGGCTTCAC ATCAGGGGCT CTGGCCCCAC 

1001 TGGCCTTCTC TGTGCTGCCT GAACTAATAG GGACTAGAAG GATTTACTGT 

1051 GGCCTGGGAC TGTTGCAGAT GATAGAGAGC ATCGGGGGGC TGCTGGGGCC 

1101 TCCTCTCTCA GGCTACCTCC GGGATGTGTC AGGCAACTAC ACGGCTTCTT 

1151 TTGTGGTGGC TGGGGCCTTC CTTCTTTCAG GGAGTGGCAT TCTCCTCACC 

1201 CTGCCCCACT TCTTCTGCTT CTCAACTACT ACCTCCGGGC CTCAGGACCT 

1251 TGTAACAGAA GCACTAGATA CTAAAGTTCC CCTACCCAAG GAGGGGCTGG 

1301 AAGGAGGACT GAACTCCACA GAGTCAGGCC CAGAAAGCCA AAGCTTGACA 

1351 GCTCCAGGTC TTCTCTTGCC ACGTCTTGGT CTCCACAGAA CCACAGTGCC 

1401 TTAAGATTCT TGATCTGCCT CCCCCTAGAG CAGGCCTGGG GCTCCTGCAA 

1451 TGTGTGTGCC AACCCTTT (SEQ ID NO:1) 



FEATURES:

5' UTR: 1-30

Start Codon: 31

Stop Codon: 1402

3' UTR: 1405
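
The coordinates listed above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the variable names are hypothetical, and the total transcript length of 1468 nt is an assumption read off the last sequence row (1451 plus 18 bases), not a value stated in the feature list.

```python
# Coordinates are 1-based and inclusive, as in the feature list.
START_CODON = 31    # first base of the ATG
STOP_CODON = 1402   # first base of the stop codon
SEQ_LENGTH = 1468   # assumption: counted from the last sequence row

# Bases upstream of the start codon.
five_utr_len = START_CODON - 1

# Coding bases from the start codon up to, but not including, the stop codon.
coding_len = STOP_CODON - START_CODON
codons = coding_len // 3

# Bases downstream of the three-base stop codon.
three_utr_start = STOP_CODON + 3
three_utr_len = SEQ_LENGTH - three_utr_start + 1

print(f"5' UTR: {five_utr_len} nt")
print(f"coding region: {coding_len} nt = {codons} codons (remainder {coding_len % 3})")
print(f"3' UTR: starts at {three_utr_start}, {three_utr_len} nt")
```

Under these assumptions the coding region is a whole number of codons, and the codon count agrees with the 457-residue protein listed as SEQ ID NO: 2.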

HOMOLOGOUS PROTEINS:








Top 10 BLAST Hits:

                                                                     Score  E
CRA|103000001515981 /altid=gi|7670446 /def=dbj|BAA95074.1| (AB0..    250    3e-65
CRA|150000165029756 /altid=gi|13431667 /def=sp|O70461|MOT3_RAT ..    244    1e-63
CRA|89000000192725 /altid=gi|10048452 /def=ref|NP_065262.1| sol..    238    8e-62
CRA|18000005042369 /altid=gi|2497855 /def=sp|Q63344|MOT2_RAT MO..    238    1e-61
CRA|18000005039313 /altid=gi|1432167 /def=gb|AAB04023.1| (U6231..    238    1e-61
CRA|18000005141743 /altid=gi|6755536 /def=ref|NP_035521.1| solu..    234    2e-60
CRA|335001098681302 /altid=gi|11418102 /def=ref|XP_009979.1| mo..    234    2e-60
CRA|1000682335761 /altid=gi|7019529 /def=ref|NP_037488.1| monoc..    233    5e-60
CRA|18000005141744 /altid=gi|4759120 /def=ref|NP_004722.1| solu..    232    6e-60
CRA|108000024650708 /altid=gi|12737028 /def=ref|XP_012127.1| so..    232    6e-60


BLAST dbEST hits:

                                            Score  E
gi|8423571 /dataset=dbest /taxon=960...     733    0.0



EXPRESSION INFORMATION FOR MODULATORY USE: 

library source: 

From BLAST dbEST hits: 

gi|8423571 breast 

From tissue screening panels: 
Spleen 

Breast (adult) 

1 MARRTEPPDG GWGRVVVLSA FFQSALVFGV LRSFGVFFVE FVAAFEEQAA 
51 RVSWIASIGI AVQQFGSPVG SALSTKFGPR PVVMTGGILA ALGMLLASFA 
101 TSLTHLYLSI GLLSGSGWAL TFAPTLACLS CYFSRRRSLA TGLALTGVGL 
151 SSFTFAPFFQ WLLSHYAWRG SLLLVSALSL HLVACGALLR PPSLAEDPAV 
201 GGPRAQLTSL LHHGPFLRYT VALTLINTGY FIPYLHLVAH LQDLDWDPLP 
251 AAFLLSVVAI SDLVGRVVSG WLGDAVPGPV TRLLMLWTTL TGVSLALFPV 
301 AQAPTALVAL AVAYGFTSGA LAPLAFSVLP ELIGTRRIYC GLGLLQMIES 
351 IGGLLGPPLS GYLRDVSGNY TASFVVAGAF LLSGSGILLT LPHFFCFSTT 
401 TSGPQDLVTE ALDTKVPLPK EGLEGGLNST ESGPESQSLT APGLLLPRLG 
451 LHRTTVP (SEQ ID NO: 2) 



FEATURES : 

Functional domains and key regions : 

[1] PDOC00001 PS00001 ASN_GLYCOSYLATION 
N-glycosylation site 

Number of matches: 2 

1 369-372 NYTA 

2 428-431 NSTE 
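
The two matches above can be reproduced with a simple pattern scan. The sketch below assumes the usual PROSITE form of PS00001, N-{P}-[ST]-{P}, written as a regular expression; the function name `find_n_glyc_sites` is hypothetical, and the sequence is SEQ ID NO: 2 copied from the listing above.

```python
import re

# SEQ ID NO: 2, copied from the listing (457 residues).
PROTEIN = (
    "MARRTEPPDGGWGRVVVLSAFFQSALVFGVLRSFGVFFVEFVAAFEEQAA"
    "RVSWIASIGIAVQQFGSPVGSALSTKFGPRPVVMTGGILAALGMLLASFA"
    "TSLTHLYLSIGLLSGSGWALTFAPTLACLSCYFSRRRSLATGLALTGVGL"
    "SSFTFAPFFQWLLSHYAWRGSLLLVSALSLHLVACGALLRPPSLAEDPAV"
    "GGPRAQLTSLLHHGPFLRYTVALTLINTGYFIPYLHLVAHLQDLDWDPLP"
    "AAFLLSVVAISDLVGRVVSGWLGDAVPGPVTRLLMLWTTLTGVSLALFPV"
    "AQAPTALVALAVAYGFTSGALAPLAFSVLPELIGTRRIYCGLGLLQMIES"
    "IGGLLGPPLSGYLRDVSGNYTASFVVAGAFLLSGSGILLTLPHFFCFSTT"
    "TSGPQDLVTEALDTKVPLPKEGLEGGLNSTESGPESQSLTAPGLLLPRLG"
    "LHRTTVP"
)

# Assumed pattern: N, any residue except P, S or T, any residue except P.
# The lookahead lets overlapping candidate sites be reported as well.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glyc_sites(seq):
    """Return (begin, end, motif) tuples using 1-based inclusive positions."""
    return [(m.start() + 1, m.start() + 4, m.group(1))
            for m in N_GLYC.finditer(seq)]


sites = find_n_glyc_sites(PROTEIN)
for begin, end, motif in sites:
    print(f"{begin}-{end} {motif}")
```

A motif match only marks a candidate site; whether the asparagine is actually glycosylated is not something this scan can show.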

[2] PDOC00004 PS00004 CAMP_PHOSPHO_SITE 

cAMP- and cGMP-dependent protein kinase phosphorylation site 

135-138 RRRS 

[3] PDOC00005 PS00005 PKC_PHOSPHO_SITE 
Protein kinase C phosphorylation site 

Number of matches: 3 

1 74-76 STK 

2 134-136 SRR 

3 335-337 TRR 

[4] PDOC00006 PS00006 CK2_PHOSPHO_SITE 
Casein kinase II phosphorylation site 

Number of matches: 2 

1 193-196 SLAE 

2 432-435 SGPE 

[5] PDOC00008 PS00008 MYRISTYL 
N-myristoylation site 

Number of matches: 18 

 1   29-34   GVLRSF
 2   66-71   GSPVGS
 3   70-75   GSALST
 4   86-91   GGILAA
 5   87-92   GILAAL
 6   93-98   GMLLAS
 7  111-116  GLLSGS
 8  115-120  GSGWAL
 9  142-147  GLALTG
10  147-152  GVGLSS
11  201-206  GGPRAQ
12  292-297  GVSLAL
13  368-373  GNYTAS
14  386-391  GILLTL
15  422-427  GLEGGL
16  425-430  GGLNST
17  426-431  GLNSTE
18  450-455  GLHRTT

Membrane spanning structure and domains:

Helix  Begin  End  Score  Certainty
 1      13     33  1.302  Certain
 2      52     72  1.039  Certain
 3      81    101  2.101  Certain
 4     114    134  1.703  Certain
 5     139    159  1.850  Certain
 6     170    190  1.572  Certain
 7     219    239  1.192  Certain
 8     245    265  1.019  Certain
 9     283    303  1.832  Certain
10     306    326  1.709  Certain
11     338    358  0.976  Putative
12     372    392  1.982  Certain

BLAST Alignment to Top Hit:

>CRA|150000165029756 /altid=gi|13431667 /def=sp|O70461|MOT3_RAT
MONOCARBOXYLATE TRANSPORTER 3 (MCT 3) /org=MCT 3
/dataset=nraa /length=492
Length = 492

Score = 244 bits (617), Expect = 1e-63

Identities = 168/470 (35%), Positives = 239/470 (50%), Gaps = 36/470 (7%) 

Query: 3 RRTEPPDGGWGRVVVLSAFFQSALVFGVLRSFGVFFVEFVAAFEEQAARVSWIASIGIAV 62 

R PPDGGWG VV+ + F + +G + + VFF E F + +W++SI +A+ 

Sbjct: 8 RGAGPPDGGWGWVVLGACFVITGFAYGFPKAVSVFFRELKRDFGAGYSDTAWVSSIMLAM 67 

Query: 63 QQFGSPVGSALSTKFGPRPVVMTGGILAALGMLLASFATSLTHLYLSIGLLSGSGWALTF 122 

P+ S L T+FG RPV++ GG+LA+ GM+LASFA+ L LYL+ G+L+G G AL F 
Sbjct: 68 LYGTGPLSSILVTRFGCRPVMLAGGLLASAGMILASFASRLLELYLTAGVLTGLGLALNF 127 

Query: 123 APTLACLSCYFSRRRSLATGLALTGVGLSSFTFAPFFQWLLSHYAWRGSLLLVSALSLHL 182 

P+L L YF RRR LA GLA G + T +P Q L + WRG LL L LH 
Sbjct: 128 QPSLIMLGLYFERRRPLANGLAAAGSPVFLSTLSPLGQLLGERFGWRGGFLLFGGLLLHC 187 

Query: 183 VACGALLRPPSLAE DPAVGGPRAQLTSLLH HGPFLRYTVALTLINTGYFIPY 234 

ACGA++RPP + DPA G RA+ LL F+ Y V L+ G F+P 

Sbjct: 188 CACGAVMRPPPGPQPRPDPAPPGGRARHRQLLDLAVCTDRTFMVYMVTKFLMALGLFVPA 247 

Query: 235 LHLVAHLQDLDWDPLPAAFLLSVVAISDLVGRVVSGWLG--DAVPGPVTRLLMLWTTLTG 292 

+ LV + +D AAFLLS+V D+V RGL + V L L G 

Sbjct: 248 ILLVNYAKDAGVPDAEAAFLLSIVGFVDIVARPACGALAGLGRLRPHVPYLFSLALLANG 307 

Query: 293 VSLALFPVAQAPTALVALAVAYGFTSGALAPLAFSVLPELIGTRRIYCGLGLLQMIESIG 352 

++ + A++ LVA +A+G + G + L F VL +G R LGL+ ++E++ 
Sbjct: 308 LTDLISARARSYGTLVAFCIAFGLSYGMVGALQFEVLMATVGAPRFPSALGLVLLVEAVA 367 

Query: 353 GLLGPPLSGYLRDVSGNYTASFVVAGAFLLSGSGILLTLPHFFCFSTT 400 

L+GPP +G L D NY F +AG+ ++ +G+ + + + C + 
Sbjct: 368 VLIGPPSAGRLVDALKNYEIIFYLAGS-EVALAGVFMAVTTYCCLRCSKNISSGRSAEGG 426 

Query: 401 TSGPQDLVTEALDTKVPLPKEGLEGGLNSTESGPESQSLTAPGLLLPRLG 450 
S P+D+ EA P+P STE E SL A + L PR G 

Sbjct: 427 ASDPEDV--EAERDSEPMPA STE EPGSLEALEVLSPRAG 463 (SEQ ID NO:4)



>CRA|89000000192725 /altid=gi|10048452 /def=ref|NP_065262.1| solute
carrier family 16 (monocarboxylic acid transporters),
member 8; proton-coupled monocarboxylate transporter 3
gene; proton-coupled monocarboxylate transporter 3 [Mus
musculus] /org=Mus musculus /taxon=10090 /dataset=nraa
/length=492
Length = 492

Score = 238 bits (602), Expect = 8e-62

Identities = 165/470 (35%), Positives = 236/470 (50%), Gaps = 36/470 (7%)

Query: 3 RRTEPPDGGWGRVVVLSAFFQSALVFGVLRSFGVFFVEFVAAFEEQAARVSWIASIGIAV 62 

R PPDGGWG VV+ + F + +G ++ VFF E F + +W++SI +A+ 
Sbjct: 8 RGAGPPDGGWGWVVLGACFVVTGFAYGFPKAVSVFFRELKRDFGAGYSDTAWVSSIMLAM 67 

Query: 63 QQFGSPVGSALSTKFGPRPVVMTGGILAALGMLLASFATSLTHLYLSIGLLSGSGWALTF 122 

P+ S L T+FG RPV++ GG+LA+ GM+LASFA+ L LYL+ G+L+G G AL F 
Sbjct: 68 LYGTGPLSSILVTRFGCRPVMLAGGLLASAGMILASFASRLVELYLTAGVLTGLGLALNF 127 

Query: 123 APTLACLSCYFSRRRSLATGLALTGVGLSSFTFAPFFQWLLSHYAWRGSLLLVSALSLHL 182 

P+L L YF RRR LA GLA G + +P Q L + WRG LL L LH 

Sbjct: 128 QPSLIMLGLYFERRRPLANGLAAAGSPVFLSMLSPLGQLLGERFGWRGGFLLFGGLLLHC 187 



Query: 183 VACGALLRP PSLAEDPAVGGPRAQLTSLLH HGPFLRYTVALTLINTGYFIPY 234 

ACGA++RP P DP+ G A+ LL F+ Y V L+ G F+P 

Sbjct: 188 CACGAVMRPPPGPPPRRDPSPHGGPARRRRLLDVAVCTDRAFVVYVVTKFLMALGLFVPA 247 

Query: 235 LHLVAHLQDLDWDPLPAAFLLSVVAISDLVGRVVSGWLG--DAVPGPVTRLLMLWTTLTG 292 

+ LV + +D AAFLLS+V D+V RGL +VLL G 

Sbjct: 248 ILLVNYAKDAGVPDAEAAFLLSIVGFVDIVARPACGALAGLGRLRPHVPYLFSLALLANG 307 

Query: 293 VSLALFPVAQAPTALVALAVAYGFTSGALAPLAFSVLPELIGTRRIYCGLGLLQMIESIG 352 

++ + A++ LVA +A+G + G + L F VL +G R LGL+ ++E++ 
Sbjct: 308 LTDLISARARSYGTLVAFCIAFGLSYGMVGALQFEVLMATVGAPRFPSALGLVLLVEAVA 367 

Query: 353 GLLGPPLSGYLRDVSGNYTASFVVAGAFLLSGSGILLTLPHFFCFSTT 400 

L+GPP +G L D NY F +AG+ ++ +G+ + + + C + 
Sbjct: 368 VLIGPPSAGRLVDALKNYEIIFYLAGS-EVALAGVFMAVTTYCCLRCSKNISSGRSAEGG 426 

Query: 401 TSGPQDLVTEALDTKVPLPKEGLEGGLNSTESGPESQSLTAPGLLLPRLG 450 
S P+D+ EA P+P STE E SL A +L PR G 

Sbjct: 427 ASDPEDV--EAERDSEPMPA STE EPGSLEALEVLSPRAG 463 (SEQ ID NO:5)



Hmmer search results (Pfam):

Model    Description                                   Score  E-value  N
PF01587  Monocarboxylate transporter                   204.9  1.2e-57  2
PF01925  Domain of unknown function                      4.4      4.6  1
PF00348  Polyprenyl synthetases                          3.7      6.1  1
PF00083  Sugar (and other) transporter                   3.0      3.8  1
PF01306  LacY proton/sugar symporter                     2.7      6.6  1
PF01309  Equine arteritis virus small envelope glycop    2.3        5  1



Parsed for domains:

Model    Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
PF01925  1/1        65     97 ..    165    201 .]    4.4      4.6
PF00083  1/1        12    108 ..      1    113 [.    3.0      3.8
PF01309  1/1       153    173 ..      1     21 [.    2.3        5
PF00348  1/1       174    191 ..      1     19 [.    3.7      6.1
PF01587  1/2        20    192 ..      1    191 [.  160.8  2.3e-44
PF01587  2/2       219    377 ..    441    611 .]   48.3  1.6e-12
PF01306  1/1       373    393 ..    393    415 .]    2.7      6.6

1 CATTTTTAGT GCATGGATTT TCTAACTGAA CCCCTTGGGC AACGCTTAAT 
51 AGTAGGTACT ATTATCCCCA GTTTACAGAT GGGGAAACCA ACTGAGAGAT 
101 TCAGCATCTT GATCGAGTTA AGTAATAAAG TCAAGATTGG AACTGGGCCA 
151 GGCACGGTGG CTCACGCCTG TAATCCCAGC ACTTTGGGAG GCCAAGGCTG 
201 GTGGATCACT TGAGGTCAGG AGTTCGAGAC CAGCGTGGCC AACATGGTGA 
251 GACCTCGTCT CTACTAAAAA TACCAAAATT AACTGGGCGT TGTGGTGGGA 
301 GCCTGTAATC CCAGAAACTC AGGAGACTGA GGCAGGAGAA TCACTTGAAC 
351 CCGGGAGGTG GAGGTTGCAG TGAGCCAAGA TCATGCCACT GCACTCCAGC 
401 CTGGGCCACA GAGCAAGACT CCGTCTCAAA ATAAATAAAT AAATAAATAA 
451 ATAAATAAAA GACTGGAACT GTGATCTGAT TCTAAAGACC CGAGTTCTTA 
501 ATCACTATGT AATACAGCCA CAGCAATTTC TGTATCTTTG GCATATTCCC 
551 CACCAGCCGA CATTTTGACT CTTAGAAAGT ATATATGTGT ATTATTGATG 
601 ATTACTTTTA TTTCCCACAT ATAAAATTAT TTAAGGCTCA ATATGTCTTT 
651 TAAGACTGCA CACCTCCCTC CCTGCCTCCA CTTCTTGTTT GCTGCTTTCC 
701 CCAGTAATCT GGGAGTGAAC ATTGAGTCCA CGGTTTCAAG GTCAGGGTCC 
751 TGGGAAGTAT GGCTTATAAT GAAGGAACAG GAAATCCAAG CCATTGGTGT 
801 TATGGAGACT GGGAAGGACT GGGGAGTGTT TGCTAGGGGC CTGAGGACTA 
851 CTTGGGTAAG AGGGGGCTGA CTGCTCCAGT GGCCAGGGTC ATAGTTTTGT 
901 CTCTTTAGTC TACCCCACCA TCAGATCAAA AAAGGTGGTT AGGAAGTGGT 
951 TGTTACTAGA GGGCAGAGGA AAAGGTTCCA GCCCCAGTGA GGAAGAGGTA 
1001 GGTGGTGTTG GTGGGGCCCT GTGTGAGCTT ACAGCCGCCC TTCCTCTCCT 
1051 CAGTTATTTT TGGTCTCTGT GACCTGTAGG TTTCCTGTTA GTGGGAACAG 
1101 AAGTGACAGG AACGAGTTCC CACTACAGAA ATGAACGCCA GGAGTCCAAC 
1151 TCATTCCCCT TCTCTCTTCC CTTAGCCGTT GAACTTCTCA GGGATCCAGG 
1201 CTTCTAGGTC TGCGTGCCTA GGGCTGCGTG TTAGTGGCTT CAGGCGCTGC 
1251 GCCAAACACT TCGTTTGAGT CTCATCTCCT AACCCCTCCC CTACCCCCAA 
1301 CAGGGCCTTG CAATTCCTGG ACCCCTCATT AAAGCAAGAG AGTCCTCTCC 
1351 TCTCCAGACC CAGTTTACCC ACCACTAACC CTTCCGTGTG GCTCTGGGTG 
1401 CTGAAACGGG GATGACTTGG CCCGCTAGGT GAAGAGGAGA CGGAAGCTTC 
1451 CTGGCAGTCC CCGCGTCACG TGGGGCCCTA CCTAGTCAGC CTCCTAACGC 
1501 CCCTCCTTAC GCATGCGCCC ATTCACTGCT GGTCCCCAAC AATGCCTAAA 
1551 TCCCGCCCTG CCCTTCTCGT TCCGCCCCTG CCCGGGAGCC CCGCGTCCTC 
1601 ATTGGCGAGC TCCAGGGTGG CCCGGCCCGG ACACCCCAGT GATAAAATAG 
1651 ATCATCTACA CGGAAACTGG CGCGCTCCAG GGGTGGGGCC CAAACTCAGT 
1701 TCCACCCTCT GGCTCCCAGC CGAACACCGA ACCGGGACCG ATCCGGCCCC 
1751 GGCTTGAACT AGCTCAGCTC CGAGCTCGCG GAACCACGCC CCCGGGAGAC 
1801 TCTGGCCCGG CCAGCGCGGG CCAGGTCTTC AGTCCTATAT CGCCCTGCCT 
1851 TGGGAAAAGG TGCAGGGGCC TCTCGCCGCC TCGTCGGGCC CTTCCTCTCT 
1901 ACCTGCCTCT CCAACCCCTC TCGGCCCCGA GCCACCCGGC AGCGGGGGTG 
1951 GGTGTGCAGA GGTGCGGCGT CCAGAACCCG GCTCCTGCAG AGGCTCTGGG 
2001 TGGCAGCAGC CCTGTTACCG CTTAGATGGC GCGCAGGACA GAGCCCCCCG 
2051 ACGGGGGCTG GGGATGGGTG GTGGTGCTCT CAGCGTTCTT CCAGTCGGCG 
2101 CTTGTGTTTG GGGTGCTCCG CTCCTTTGGG GTCTTCTTCG TGGAGTTTGT 
2151 GGCGGCGTTT GAGGAGCAGG CAGCGCGCGT CTCCTGGATC GCCTCCATAG 
2201 GAATCGCGGT GCAGCAGTTT GGGAGTGAGT GCGGCGCCTG GATCTGGCGG 
2251 ACTGCGACCC TCGGAAGGGA GAGGGAATGC GGCGACTGGG AAGTGGAAGG 
2301 GCGAGGGGCG GGAGATGCTG GGGGGGAGAC CCCTGAGATC TTCTCGCAGC 
2351 GCCCCTTCCA CTTCCTCAGG CCCGGTAGGC AGTGCCCTGA GCACGAAGTT 
2401 CGGGCCCAGG CCCGTGGTGA TGACTGGAGG CATCTTGGCT GCGCTGGGGA 
2451 TGCTGCTCGC CTCTTTTGCT ACTTCCTTGA CCCACCTATA CCTGAGTATT 
2501 GGGTTGCTGT CAGGTGAGAG CCTGCACAAG GGCAGGAGAG TCAAATGCTT 
2551 AGATCGTTGG ATGTTCACCT CCTTCCTGCT CCTTCCAAAG GGTTCGGGGA 
2601 GAAGCTGAGG GAAAGTTTAG CTAGCACCTG TACCCAGAAG GGAATTCTTA 
2651 ATAGGAATGA CTAAAGCGAC AAACATGGTG AGGAATTAGG AAATTCAAGG 
2701 ATGATGAAAC CTGGCCAGGC ACGGTGGCTC ACGCCTGTAA TCCCAGCACT 
2751 TTGGGAAGCC GAGGCGGGTG GATCACGAGG TCAGGAGTTT GAGACCAGCC 
2801 TGGCCAACAT GGTGAAACCC CGTCTCTACA AAAATACAAA AATTAGCCGG 
2851 GCCTGGTGGC GCTAATCCCA GTTACTCGGG AGGCTGAGGC AGGAGAATCG 
2901 CTTGAACCCG GGAGGCGGAG GTTGCAGTGA GCCAAGATCG CACCACTGCA 
2951 CTCCAGCCTG GGCGACAGAG CAAGATTCTG TCTCAAAAAA AAAAAAAAAA 
3001 AAAAAAAAAA AGATGAAACC AAGTATACAA GCCCAGAAGC CTAGGGCTAA 
3051 TGGGACTGGA GTGCAAAAGG AAGAATTACT ATAAAATGGT GCTAGGGGCC 
3101 AGGCACGGTG GCTCACGCCT GTAATCCCAG CACTTTGGGA GGCCGAGGCG 
3151 GGCGGATCAC GAGGTCAGGA GATCAAGACC ATCCTGGCTA ACACGGTGAA 
3201 ATCACGTCTC TACTAAAAAC ACAAAAAATT AGCTGGGCGT GGTGGCAGGT 
3251 GACTGTAGTC CCAGCTACTC GGGAGGCTGA GGCAGGAGAA TGGTGTGAAC 
3301 CCGGGAAGCA GAGCTTGCAG TGAGCCGAGA TTGCACCACT GCACTCCAGC 
3351 CTGGGCGACA GAGCGAGACT CCGTCTCAAA AAAAAAAAGA AAAAAAAAGG 
3401 TGCTAGGTAC TGTGACTGTG AAATCGATAT CATTATTGGA TTTACAGCTG 
3451 GGGAAAAGCT TTAAAGCTTA TACAACTTGG CAAATGAAGG TCACACAGCT 
3501 AGAAATGGTA GAGCCCAGGT CTAACTCCAA AGTTCTGTGC TAGTTACCTT 
3551 ACAAACTTTG TCTCTAATCT TCCACAATCC CAAAAAGTGT ATTATTACAT 
3601 TTTGCAGTTG AGAAGGTTGA GGCTGGGGGT GTTAAGTAAA ACACACAAGG 
3651 TTACACAGCT ATGAAGTATC CAAGCCAAGA TTGTATCCCA GGTCTGTGGG 
3701 ACTCCGAAGC AAGTGCTACA TTCTGCTGCT GGGCAATGCG GGGATTACTG 
3751 TGTGCCTTGA GCTCCCTAAG AGTTCTCAAC ACCACTTCTT CCTTTTTGAC 
3801 AGGCTCTGGC TGGGCTTTGA CCTTCGCTCC GACCCTGGCC TGCCTGTCCT 
3851 GTTATTTCTC TCGCCGACGA TCCCTGGCCA CCGGGCTGGC ACTGACAGGC 
3901 GTGGGCCTCT CCTCCTTCAC ATTTGCCCCC TTTTTCCAGT GGCTGCTCAG 
3951 CCACTACGCC TGGAGGGGGT CCCTGCTGCT GGTGTCTGCC CTCTCCCTCC 
4001 ACCTAGTGGC CTGTGGTGCT CTCCTCCGCC CACCCTCCCT GGCTGAGGAC 
4051 CCTGCTGTGG GTGGTCCCAG GGCCCAACTC ACCTCTCTCC TCCATCATGG 
4101 CCCCTTCCTC CGTTACACTG TTGCCCTCAC CCTGATCAAC ACTGGCTACT 
4151 TCATTCCCTA CCTCCACCTG GTGGCCCATC TCCAGGACCT GGATTGGGAC 
4201 CCACTACCTG CTGCCTTCCT ACTCTCAGTT GTTGCTATTT CTGACCTCGT 
4251 GGGGCGTGTG GTCTCCGGAT GGCTGGGAGA TGCAGTCCCA GGGCCTGTGA 
4301 CACGACTCCT GATGCTCTGG ACCACCTTGA CTGGGGTGTC ACTAGCCCTG 
4351 TTCCCTGTAG CTCAGGCTCC CACAGCCCTG GTGGCTCTGG CTGTGGCCTA 
4401 CGGCTTCACA TCAGGGGCTC TGGCCCCACT GGCCTTCTCT GTGCTGCCTG 
4451 AACTAATAGG GACTAGAAGG ATTTACTGTG GCCTGGGACT GTTGCAGATG 
4501 ATAGAGAGCA TCGGGGGGCT GCTGGGGCCT CCTCTCTCAG GTAAGTGGAA 
4551 TGGGGTTCCC AGGGGGTGAG GGCTGCCATG TTGCACAACT AGGGGAGGGT 
4601 ACTATTCTCA TTACAGTGTA TGTGAATATT GCCCTCTGGT GTAGTACAGT 
4651 ACACAGCCTG CGTGGCCAAC CATAGCATCC CTGAAATGGG TCCATGGGGC 
4701 AAAGAACTTG GGGCTGGGAA AGTCTGAGTG GAAAGACAAA AAGAAGCTAA 
4751 GTGGAACCCT TGGCAGGGTG CCTACGGCTT GGGTTTGCAG AGGACCTGGC 
4801 AGAACCTGGC CAGACACAGA CGTAGCATTC CAGTGTGCAC CCTTTCCTTT 
4851 GGCCTACTGG GCCCCAAACC AGGTATCTGA GGCACCTGGT CAAAGTTCTG 
4901 CTGGCTCAGG GTGCCAGAAC TTTCAGACCT TTATCTCCTC TTACCCATTA 
4951 ACTGAAGCTT TAGAAAGGCC ACAGTTGGTG GGCGCCTGTA GTCCCAGCTA 
5001 CTCAGGAGGC TGAGGCAGGA GAATGGCATG AACCCGGGAG GCGGAGCTTG 
5051 CAGTGAGCTG AGATCGCGCC ACTGCACTTC AGCCTGGGCG ACAGAGCGAG 
5101 ACTCCGTCTC AAAAAAAAAA AAAAAAGAAA GGCCACAGTT GCCAGAAAGA 
5151 AAGGCACAAG TATGCCTGAC TCAATCTGGA TCTCCAAATC CCTGCAGGCT 
5201 GGTTTGGAGG TCCTTTCTGA AGGCGGGGAG GTGGTTGAAA TTAACTTTTG 
5251 AGGCCCTTTT GGGAAACCAG AGTTCTTAAG TTTATCCAAC TATTCCATGG 
5301 GAGTTCCAAC TCCTCTGAGA TGATAAGTCT TCCCTCCACC CAAAAATGTA 
5351 TCTGAGCCCT CAGCCCCAGC AAATAGATCA CTCATGTGTA TTCTTTTTCT 
5401 CTCTTGGACC TAGGCTACCT CCGGGATGTG ACAGGCAACT ACACGGCTTC 
5451 TTTTGTGGTG GCTGGGGCCT TCCTTCTTTC AGGGAGTGGC ATTCTCCTCA 
5501 CCCTGCCCCA CTTCTTCTGC TTCTCAACTA CTACCTCCGG GCCCCAGGAC 
5551 CTTGTAACAG AAGCACTAGA TACTAAAGTT CCCCTACCCA AGGAGGGACT 
5601 GGAAGGAGGA CTGAACTCCA CAGAGTCAGG CCCAGAAAGC CAAAGCTTGA 
5651 CAGCTCCAGG TCTTCTCTTG CCACGTCTTG GTCTCCACAG AACCACAGTG 
5701 CCTTAAGATT CTTGATCTGC CTCCCCCTAG AGCAGGCCTG GGGCTCCTGC 
5751 AATGTGTGTG CCAACCCTTT GTATTTTGTT GAGGACTCTT ATTTCTCCGT 
5801 TACTCTCCTA ACCTTTTCTT CTTTTTTCTT TTTCCCGAGA CGGAGTCTTG 
5851 CTCTGTTGCC CAGGCTGGAG TGCAGTGATG TGATCTCGGC TCACTGCAAC 
5901 CTCCGCTTCC CGGGTTCAAG CGATTCTCCT GCCTCAGCCT CCCAAGTAGC 
5951 TGGGATTACA GGCGGGAGCC ACCACACCCG GCTATTTTTT TTTTTTTTTT 
6001 TTTNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNTTTTGG TAGAGACAGG 
6051 GTTTCACCAT GTTGGCCAGG ATGGTCTCGA ACTCCTGACC TTGTGATCCA 
6101 CCCCCCGCCC CTCCCTCGGC CTTCCAAAGT GCTGGGATTA CAGGCGTGAG 
6151 CCACCACACC CAGCCTCCCC TAACCTTTTC TAAAGGACCC AGGAGTTTTG 
6201 AAGGATCCGG GAGTTCCTGC TTCACTGAGC TGTGAATCAA CTGTGAAAAT 
6251 CAAAGGCCAA GAGACTTATC ATGCTTTATA TAACATCTCT AGTGTTGCCT 
6301 CCTGAGTTTC TTCTCTGAAG ACACATGTTT GGGAAACAAA ACTGTCCCTT 
6351 TGAGATAAAA TCAAATAAGA AAATTGGATA ATAATCACAA CCTCAAAATG 
6401 AGCTGGGGCC CATATGCTTG GGTTGGCCGA ATGGAGTCAT GCCTGGAAGT 
6451 GGAGGAGAGT GTCCAGGAGC TCCGATGACC CAAGGCATTT TAACCCTGGA 
6501 ATCTGCTCTC CAGGCTACCA CCACATACCT CCCTCTTCCC CATTATCCCT 
6551 GTGGCTTAGA AAAGAA (SEQ ID NO: 3) 



FEATURES:

Start:   2026
Exon:    2026-2224
Intron:  2225-2369
Exon:    2370-2513
Intron:  2514-3802
Exon:    3803-4540
Intron:  4541-5413
Exon:    5414-5703
Stop:    5704
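
The exon coordinates above can be checked for internal consistency. The following is a minimal Python sketch, with hypothetical variable names, that sums the exon lengths and derives the intervening introns; it assumes the coordinates are 1-based and inclusive and that the last exon ends immediately before the stop codon at 5704.

```python
# Exon coordinates from the feature list (1-based, inclusive).
EXONS = [(2026, 2224), (2370, 2513), (3803, 4540), (5414, 5703)]
STOP = 5704  # first base of the stop codon

# Length of each exon and of the spliced coding region (stop codon excluded).
exon_lengths = [end - start + 1 for start, end in EXONS]
coding_len = sum(exon_lengths)
codons = coding_len // 3

# Introns are the gaps between consecutive exons.
introns = [(prev_end + 1, next_start - 1)
           for (_, prev_end), (next_start, _) in zip(EXONS, EXONS[1:])]

print("exon lengths:", exon_lengths)
print("coding length:", coding_len, "nt =", codons, "codons")
print("introns:", introns)
```

Under these assumptions the four exons add up to a whole number of codons, the same count as the 457-residue protein in SEQ ID NO: 2, and the derived introns match the three listed above.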





CHROMOSOME MAP POSITION: 

Chromosome 17 



ALLELIC VARIANTS (SNPs) : 

DNA                                       Protein
Position  Major  Minor  Domain            Position  Major  Minor
423       G      A      Beyond ORF(5')
2717      A      G      Intron
3064      C      T      Intron
4146      C      A      Exon              229       G      G
4440      T      C      Exon              327       S      S
4443      G      T      Exon              328       V      V
5105      T      C      Intron
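
The protein positions given for the exonic SNPs follow from the exon map in the FEATURES list. The sketch below shows one way to reproduce them; the function name `locate_snp` is hypothetical, and the exon coordinates are assumed to be 1-based and inclusive with the coding region starting at the first exon base.

```python
# Exon coordinates from the genomic FEATURES list (1-based, inclusive).
EXONS = [(2026, 2224), (2370, 2513), (3803, 4540), (5414, 5703)]


def locate_snp(pos):
    """Return (codon_number, position_in_codon) for a genomic position.

    Both values are 1-based. Returns None when the position falls outside
    every exon (intron or flanking sequence).
    """
    coding_offset = 0  # coding bases contributed by exons before this one
    for start, end in EXONS:
        if start <= pos <= end:
            index = coding_offset + (pos - start)  # 0-based coding index
            return index // 3 + 1, index % 3 + 1
        coding_offset += end - start + 1
    return None


for snp in (423, 2717, 3064, 4146, 4440, 4443, 5105):
    print(snp, locate_snp(snp))
```

With these inputs the three exonic SNPs map to codons 229, 327 and 328, each at the third codon position, which fits the unchanged amino acids shown in the table.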



Context : 



DNA 

Position 

423 TAATAAAGTCAAGATTGGAACTGGGCCAGGCACGGTGGCTCACGCCTGTAATCCCAGCAC 
TTTGGGAGGCCAAGGCTGGTGGATCACTTGAGGTCAGGAGTTCGAGACCAGCGTGGCCAA 
CATGGTGAGACCTCGTCTCTACTAAAAATACCAAAATTAACTGGGCGTTGTGGTGGGAGC 
CTGTAATCCCAGAAACTCAGGAGACTGAGGCAGGAGAATCACTTGAACCCGGGAGGTGGA 
GGTTGCAGTGAGCCAAGATCATGCCACTGCACTCCAGCCTGGGCCACAGAGCAAGACTCC 
[G,A] 

TCTCAAAATAAATAAATAAATAAATAAATAAATAAAAGACTGGAACTGTGATCTGATTCT 
AAAGACCCGAGTTCTTAATCACTATGTAATACAGCCACAGCAATTTCTGTATCTTTGGCA 
TATTCCCCACCAGCCGACATTTTGACTCTTAGAAAGTATATATGTGTATTATTGATGATT 
ACTTTTATTTCCCACATATAAAATTATTTAAGGCTCAATATGTCTTTTAAGACTGCACAC 
CTCCCTCCCTGCCTCCACTTCTTGTTTGCTGCTTTCCCCAGTAATCTGGGAGTGAACATT 



2717 GTGATGACTGGAGGCATCTTGGCTGCGCTGGGGATGCTGCTCGCCTCTTTTGCTACTTCC 
TTGACCCACCTATACCTGAGTATTGGGTTGCTGTCAGGTGAGAGCCTGCACAAGGGCAGG 
AGAGTCAAATGCTTAGATCGTTGGATGTTCACCTCCTTCCTGCTCCTTCCAAAGGGTTCG 
GGGAGAAGCTGAGGGAAAGTTTAGCTAGCACCTGTACCCAGAAGGGAATTCTTAATAGGA 
ATGACTAAAGCGACAAACATGGTGAGGAATTAGGAAATTCAAGGATGATGAAACCTGGCC 
[A,G] 

GGCACGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAAGCCGAGGCGGGTGGATCACG 
AGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACAAAAATAC 
AAAAATTAGCCGGGCCTGGTGGCGCTAATCCCAGTTACTCGGGAGGCTGAGGCAGGAGAA 
TCGCTTGAACCCGGGAGGCGGAGGTTGCAGTGAGCCAAGATCGCACCACTGCACTCCAGC 
CTGGGCGACAGAGCAAGATTCTGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAGATGAA 

3064 GCGGGTGGATCACGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCCGT 
CTCTACAAAAATACAAAAATTAGCCGGGCCTGGTGGCGCTAATCCCAGTTACTCGGGAGG 
CTGAGGCAGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGCAGTGAGCCAAGATCGCAC 
CACTGCACTCCAGCCTGGGCGACAGAGCAAGATTCTGTCTCAAAAAAAAAAAAAAAAAAA 
AAAAAAAAGATGAAACCAAGTATACAAGCCCAGAAGCCTAGGGCTAATGGGACTGGAGTG 
[C,T] 

AAAAGGAAGAATTACTATAAAATGGTGCTAGGGGCCAGGCACGGTGGCTCACGCCTGTAA 
TCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGGAGATCAAGACCATCC 
TGGCTAACACGGTGAAATCACGTCTCTACTAAAAACACAAAAAATTAGCTGGGCGTGGTG 
GCAGGTGACTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGTGTGAACCCGG 
GAAGCAGAGCTTGCAGTGAGCCGAGATTGCACCACTGCACTCCAGCCTGGGCGACAGAGC 

4146 GTCCTGTTATTTCTCTCGCCGACGATCCCTGGCCACCGGGCTGGCACTGACAGGCGTGGG 
CCTCTCCTCCTTCACATTTGCCCCCTTTTTCCAGTGGCTGCTCAGCCACTACGCCTGGAG 
GGGGTCCCTGCTGCTGGTGTCTGCCCTCTCCCTCCACCTAGTGGCCTGTGGTGCTCTCCT 
CCGCCCACCCTCCCTGGCTGAGGACCCTGCTGTGGGTGGTCCCAGGGCCCAACTCACCTC 
TCTCCTCCATCATGGCCCCTTCCTCCGTTACACTGTTGCCCTCACCCTGATCAACACTGG 
[C,A] 

TACTTCATTCCCTACCTCCACCTGGTGGCCCATCTCCAGGACCTGGATTGGGACCCACTA 
CCTGCTGCCTTCCTACTCTCAGTTGTTGCTATTTCTGACCTCGTGGGGCGTGTGGTCTCC 
GGATGGCTGGGAGATGCAGTCCCAGGGCCTGTGACACGACTCCTGATGCTCTGGACCACC 
TTGACTGGGGTGTCACTAGCCCTGTTCCCTGTAGCTCAGGCTCCCACAGCCCTGGTGGCT 
CTGGCTGTGGCCTACGGCTTCACATCAGGGGCTCTGGCCCCACTGGCCTTCTCTGTGCTG 

4440 CACTGGCTACTTCATTCCCTACCTCCACCTGGTGGCCCATCTCCAGGACCTGGATTGGGA 
CCCACTACCTGCTGCCTTCCTACTCTCAGTTGTTGCTATTTCTGACCTCGTGGGGCGTGT 
GGTCTCCGGATGGCTGGGAGATGCAGTCCCAGGGCCTGTGACACGACTCCTGATGCTCTG 
GACCACCTTGACTGGGGTGTCACTAGCCCTGTTCCCTGTAGCTCAGGCTCCCACAGCCCT 
GGTGGCTCTGGCTGTGGCCTACGGCTTCACATCAGGGGCTCTGGCCCCACTGGCCTTCTC 
[T,C] 

GTGCTGCCTGAACTAATAGGGACTAGAAGGATTTACTGTGGCCTGGGACTGTTGCAGATG 
ATAGAGAGCATCGGGGGGCTGCTGGGGCCTCCTCTCTCAGGTAAGTGGAATGGGGTTCCC 
AGGGGGTGAGGGCTGCCATGTTGCACAACTAGGGGAGGGTACTATTCTCATTACAGTGTA 
TGTGAATATTGCCCTCTGGTGTAGTACAGTACACAGCCTGCGTGGCCAACCATAGCATCC 
CTGAAATGGGTCCATGGGGCAAAGAACTTGGGGCTGGGAAAGTCTGAGTGGAAAGACAAA 

4443 TGGCTACTTCATTCCCTACCTCCACCTGGTGGCCCATCTCCAGGACCTGGATTGGGACCC 
ACTACCTGCTGCCTTCCTACTCTCAGTTGTTGCTATTTCTGACCTCGTGGGGCGTGTGGT 
CTCCGGATGGCTGGGAGATGCAGTCCCAGGGCCTGTGACACGACTCCTGATGCTCTGGAC 
CACCTTGACTGGGGTGTCACTAGCCCTGTTCCCTGTAGCTCAGGCTCCCACAGCCCTGGT 
GGCTCTGGCTGTGGCCTACGGCTTCACATCAGGGGCTCTGGCCCCACTGGCCTTCTCTGT 
[G,T] 

CTGCCTGAACTAATAGGGACTAGAAGGATTTACTGTGGCCTGGGACTGTTGCAGATGATA 
GAGAGCATCGGGGGGCTGCTGGGGCCTCCTCTCTCAGGTAAGTGGAATGGGGTTCCCAGG 
GGGTGAGGGCTGCCATGTTGCACAACTAGGGGAGGGTACTATTCTCATTACAGTGTATGT 
GAATATTGCCCTCTGGTGTAGTACAGTACACAGCCTGCGTGGCCAACCATAGCATCCCTG 
AAATGGGTCCATGGGGCAAAGAACTTGGGGCTGGGAAAGTCTGAGTGGAAAGACAAAAAG 

5105 CCTGGCCAGACACAGACGTAGCATTCCAGTGTGCACCCTTTCCTTTGGCCTACTGGGCCC 
CAAACCAGGTATCTGAGGCACCTGGTCAAAGTTCTGCTGGCTCAGGGTGCCAGAACTTTC 
AGACCTTTATCTCCTCTTACCCATTAACTGAAGCTTTAGAAAGGCCACAGTTGGTGGGCG 
CCTGTAGTCCCAGCTACTCAGGAGGCTGAGGCAGGAGAATGGCATGAACCCGGGAGGCGG 
AGCTTGCAGTGAGCTGAGATCGCGCCACTGCACTTCAGCCTGGGCGACAGAGCGAGACTC 
[T,C] 

GTCTCAAAAAAAAAAAAAAAAGAAAGGCCACAGTTGCCAGAAAGAAAGGCACAAGTATGC 
CTGACTCAATCTGGATCTCCAAATCCCTGCAGGCTGGTTTGGAGGTCCTTTCTGAAGGCG 
GGGAGGTGGTTGAAATTAACTTTTGAGGCCCTTTTGGGAAACCAGAGTTCTTAAGTTTAT 
CCAACTATTCCATGGGAGTTCCAACTCCTCTGAGATGATAAGTCTTCCCTCCACCCAAAA 
ATGTATCTGAGCCCTCAGCCCCAGCAAATAGATCACTCATGTGTATTCTTTTTCTCTCTT 